.. _`Logistic Regression`:

.. _`org.sysess.sympathy.machinelearning.logisticregression`:

Logistic Regression
~~~~~~~~~~~~~~~~~~~

.. image:: logistic_regression.svg
   :width: 48

Logistic regression of a categorical dependent variable

**Documentation**

Logistic regression of a categorical dependent variable

*Configuration*:

- *penalty*
    Used to specify the norm used in the penalization. The 'newton-cg',
    'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is
    only supported by the 'saga' solver. If 'none' (not supported by the
    liblinear solver), no regularization is applied.

    .. versionadded:: 0.19
       l1 penalty with SAGA solver (allowing 'multinomial' + L1)

- *dual*
    Dual or primal formulation. Dual formulation is only implemented for
    l2 penalty with liblinear solver. Prefer dual=False when
    n_samples > n_features.

- *C*
    Inverse of regularization strength; must be a positive float. Like in
    support vector machines, smaller values specify stronger
    regularization.

- *fit_intercept*
    Specifies if a constant (a.k.a. bias or intercept) should be added to
    the decision function.

- *intercept_scaling*
    Useful only when the solver 'liblinear' is used and self.fit_intercept
    is set to True. In this case, x becomes [x, self.intercept_scaling],
    i.e. a "synthetic" feature with constant value equal to
    intercept_scaling is appended to the instance vector. The intercept
    becomes ``intercept_scaling * synthetic_feature_weight``.

    Note! the synthetic feature weight is subject to l1/l2 regularization
    as all other features. To lessen the effect of regularization on the
    synthetic feature weight (and therefore on the intercept),
    intercept_scaling has to be increased.

- *class_weight*
    Weights associated with classes in the form ``{class_label: weight}``.
    If not given, all classes are supposed to have weight one.

    The "balanced" mode uses the values of y to automatically adjust
    weights inversely proportional to class frequencies in the input data
    as ``n_samples / (n_classes * np.bincount(y))``.

    Note that these weights will be multiplied with sample_weight (passed
    through the fit method) if sample_weight is specified.

    .. versionadded:: 0.17
       *class_weight='balanced'*

- *tol*
    Tolerance for stopping criteria.

- *multi_class*
    If the option chosen is 'ovr', then a binary problem is fit for each
    label. For 'multinomial' the loss minimised is the multinomial loss
    fit across the entire probability distribution, *even when the data is
    binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto'
    selects 'ovr' if the data is binary, or if solver='liblinear', and
    otherwise selects 'multinomial'.

    .. versionadded:: 0.18
       Stochastic Average Gradient descent solver for 'multinomial' case.
    .. versionchanged:: 0.22
       Default changed from 'ovr' to 'auto' in 0.22.

- *max_iter*
    Maximum number of iterations taken for the solvers to converge.

- *solver*
    Algorithm to use in the optimization problem.

    - For small datasets, 'liblinear' is a good choice, whereas 'sag' and
      'saga' are faster for large ones.
    - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
      handle multinomial loss; 'liblinear' is limited to one-versus-rest
      schemes.
    - 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty
    - 'liblinear' and 'saga' also handle L1 penalty
    - 'saga' also supports 'elasticnet' penalty
    - 'liblinear' does not support setting ``penalty='none'``

    Note that fast convergence of 'sag' and 'saga' is only guaranteed on
    features with approximately the same scale. You can preprocess the
    data with a scaler from sklearn.preprocessing (see the sketch after
    this list).

    .. versionadded:: 0.17
       Stochastic Average Gradient descent solver.
    .. versionadded:: 0.19
       SAGA solver.
    .. versionchanged:: 0.22
       The default solver changed from 'liblinear' to 'lbfgs' in 0.22.

- *n_jobs*
    Number of CPU cores used when parallelizing over classes if
    multi_class='ovr'. Ignored when the solver is set to 'liblinear'
    regardless of multi_class. If given -1, all cores are used.

- *random_state*
    Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
    data. See random_state for details.

- *warm_start*
    When set to True, reuse the solution of the previous call to fit as
    initialization; otherwise, just erase the previous solution. Useless
    for the liblinear solver. See warm_start.

    .. versionadded:: 0.17
       *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga*
       solvers.
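These options mirror the parameters of scikit-learn's
``sklearn.linear_model.LogisticRegression``. The following is a minimal
sketch of setting the same options directly on that estimator, assuming
scikit-learn >= 0.22; ``X`` and ``y`` are placeholder data, not the
node's actual input handling:

.. code-block:: python

    # A minimal sketch, assuming scikit-learn >= 0.22. X and y are
    # placeholders standing in for the node's input table.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Fast convergence of 'sag'/'saga' is only guaranteed on similarly
    # scaled features (see the solver notes above), so standardize first.
    X = StandardScaler().fit_transform(X)

    clf = LogisticRegression(
        penalty='l2',             # l2 works with every solver; elasticnet needs saga
        C=1.0,                    # smaller C => stronger regularization
        fit_intercept=True,       # add a bias term to the decision function
        class_weight='balanced',  # reweight inversely to class frequency
        solver='saga',            # supports every penalty; good for large data
        multi_class='auto',       # resolves to 'ovr' here, since y is binary
        max_iter=1000,
        random_state=0,           # seed for the sag/saga/liblinear shuffling
    )
    clf.fit(X, y)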
*Attributes* (illustrated in the sketch at the end of this page):

- *n_iter_*
    Actual number of iterations for all classes. If binary or multinomial,
    it returns only 1 element. For the liblinear solver, only the maximum
    number of iterations across all classes is given.

    .. versionchanged:: 0.20
       In SciPy <= 1.0.0 the number of lbfgs iterations may exceed
       ``max_iter``. ``n_iter_`` will now report at most ``max_iter``.

- *coef_*
    Coefficient of the features in the decision function.

    `coef_` is of shape (1, n_features) when the given problem is binary.
    In particular, when `multi_class='multinomial'`, `coef_` corresponds
    to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).

- *intercept_*
    Intercept (a.k.a. bias) added to the decision function.

    If `fit_intercept` is set to False, the intercept is set to zero.
    `intercept_` is of shape (1,) when the given problem is binary. In
    particular, when `multi_class='multinomial'`, `intercept_` corresponds
    to outcome 1 (True) and `-intercept_` corresponds to outcome 0
    (False).

*Input ports*:

*Output ports*:
    **model** : model
        Model

**Definition**

*Input ports*

*Output ports*

:model: model
    Model

.. automodule:: node_regression
   :noindex:

.. class:: LogisticRegression
   :noindex:
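The fitted-attribute shapes described above can be checked directly on a
scikit-learn estimator. A minimal sketch, continuing the hypothetical
``clf`` and ``X`` from the configuration example earlier on this page
(the problem there is binary, so the leading dimension is 1):

.. code-block:: python

    # Continues the hypothetical clf from the configuration sketch above.
    print(clf.coef_.shape)       # (1, n_features) -> (1, 4) for binary problems
    print(clf.intercept_.shape)  # (1,)
    print(clf.n_iter_)           # one element for a binary problem

    # The decision function is x @ coef_.T + intercept_, thresholded at 0.
    scores = X @ clf.coef_.T + clf.intercept_
    assert ((scores.ravel() > 0).astype(int) == clf.predict(X)).all()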